In [1]:
%matplotlib inline
from ggplot import *
In [2]:
ggplot(diamonds, aes(x='carat', y='price')) + geom_point() + ggtitle("Carat vs. Price")
Out[2]:
The plot above shows a scatterplot comparing a diamond's carat with its price. The plot is composed of three layers:
ggplot(diamonds, aes(x='carat', y='price'))
-- This defines the dataset that's going to be plotted and the aesthetics (the instructions that map columns to the x and y axes).
geom_point()
-- This layer tells ggplot to render a scatter plot using the aesthetics and data defined in the base layer.
ggtitle("Carat vs. Price")
-- This layer applies a title to the plot. There are lots of other labels and customizations you can apply to your plots (xlab, ylab, etc.).
You can continue to add more layers to your plot as there are more things you'd like to see. For instance, if I wanted to customize the x and y axis labels, I could do so by adding two additional layers using xlab and ylab.
In [3]:
ggplot(diamonds, aes(x='carat', y='price')) + \
geom_point() + \
ggtitle("Carat vs. Price") + \
xlab(" Carat\n(1 carat = 200 mg)") + \
ylab(" Price\n(2008 USD)")
Out[3]:
In addition to labels, you can also add more "geoms", or plot types. For instance, let's add a linear trend-line to our plot using stat_smooth.
In [4]:
ggplot(diamonds, aes(x='carat', y='price')) + \
geom_point() + \
stat_smooth(method='lm') + \
ggtitle("Carat vs. Price") + \
xlab(" Carat\n(1 carat = 200 mg)") + \
ylab(" Price\n(2008 USD)")
Out[4]:
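To make "linear trend-line" concrete, here is a minimal sketch of an ordinary least squares fit in plain Python. It assumes that a linear ('lm') fit means ordinary least squares, which is the conventional meaning; it is not taken from the ggplot source. The fit_line helper and the toy numbers are made up for illustration and do not come from the diamonds dataset.

```python
# Toy data, invented for illustration (not taken from the diamonds dataset).
carats = [0.3, 0.5, 0.7, 1.0, 1.5]
prices = [500.0, 1500.0, 2500.0, 4000.0, 6500.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line.

    Hypothetical helper for this sketch; not part of ggplot.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(carats, prices)
print("price is roughly", slope, "* carat +", intercept)
```

The trend-line drawn on the plot is the line with this slope and intercept, evaluated across the range of the x axis.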
It looks like there are some outlying points in our plot. Let's leave them out of the picture by using xlim and ylim. Adding these layers caps the x and y axes at whatever values we tell it to.
In [5]:
ggplot(diamonds, aes(x='carat', y='price')) + \
geom_point() + \
stat_smooth(method='lm') + \
ggtitle("Carat vs. Price") + \
xlab(" Carat\n(1 carat = 200 mg)") + \
ylab(" Price\n(2008 USD)") + \
xlim(0, 3) + \
ylim(0, 20000)
Out[5]:
Instead of building your ggplots in one big expression, you can build them up one statement at a time. To do this, use the + or += operators to gradually tack layers onto your plot.
In [6]:
p = ggplot(aes(x='mpg'), data=mtcars)
p += geom_histogram()
p += xlab("Miles per Gallon")
p += ylab("# of Cars")
p
Out[6]:
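Why do + and += give the same plot? The sketch below shows one way Python's operator rules make the two styles equivalent. The Plot class is a hypothetical stand-in written for this illustration, and the layers are plain strings; it is an assumption about the general mechanism, not a description of how ggplot itself is implemented.

```python
class Plot:
    """Hypothetical stand-in for a plot object; not the ggplot class."""

    def __init__(self, layers=()):
        self.layers = list(layers)

    def __add__(self, layer):
        # Return a new Plot so the left operand is left unchanged.
        return Plot(self.layers + [layer])

    # No __iadd__ is defined, so Python evaluates `p += x` as `p = p + x`.


# Chained style: everything in one expression.
chained = Plot() + "geom_histogram" + "xlab" + "ylab"

# Stepwise style: one layer per statement.
stepwise = Plot()
stepwise += "geom_histogram"
stepwise += "xlab"
stepwise += "ylab"

print(chained.layers == stepwise.layers)
```

Because each + returns a new object in this sketch, you can keep a base plot in a variable and branch several variants off it without them affecting one another.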